regardless of what kind of analysis you use.
Dealing with missing data
Most clinical trials have incomplete data for one or more variables, which can be a real headache
when analyzing your data. The statistical aspects of missing data are quite complicated, so you should
consult a statistician if you have more than just occasional, isolated missing values. Here we describe
some commonly used approaches for coping with missing data:
Exclusion: Exclude a case from an analysis if any of the variables required for that analysis are
missing. This seems simple, but the downside to this approach is that it can reduce the number of
analyzable cases, sometimes quite severely. And if a value is missing for a reason that's related
to treatment efficacy, excluding the case can bias your results. The sketch below illustrates this
approach.
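Here's a minimal sketch of exclusion (sometimes called complete-case analysis) in Python with
pandas; the data set and column names are hypothetical:

import numpy as np
import pandas as pd

# Hypothetical trial data: one row per participant, one missing outcome value
trial = pd.DataFrame({
    "participant_id": [101, 102, 103, 104],
    "treatment": ["drug", "placebo", "drug", "placebo"],
    "glucose_wk4": [98.0, np.nan, 105.0, 110.0],  # mg/dL; 102 is missing
})

# Exclusion: drop any case missing a variable required for this analysis
complete_cases = trial.dropna(subset=["glucose_wk4"])
print(complete_cases)  # participant 102 is gone, shrinking the analyzable sample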
Imputation: Imputation means replacing a missing value with a value you impute, or create
yourself. When analysts impute in a clinical trial, they typically take the mean or median of
all the available values for that variable and fill it in for the missing value. In practice, you
have to keep the original variable and save the imputed values in a separate variable so that you
can document the type of imputation applied, as sketched below. Imputation has real downsides.
If you impute only a small number of values, it's not worth it: you gain little, and you still
introduce bias, so you may as well just exclude those cases. But if you impute a large number of
values, you are essentially making up the data yourself, adding even more bias.
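As a rough sketch of mean imputation (again with hypothetical column names), notice that the
original variable is left untouched and the imputed values are saved separately, along with a
flag documenting which values were imputed:

import numpy as np
import pandas as pd

trial = pd.DataFrame({
    "participant_id": [101, 102, 103, 104],
    "glucose_wk4": [98.0, np.nan, 105.0, 110.0],  # mg/dL; one value missing
})

# Keep the original variable; store imputed values in a separate variable
mean_wk4 = trial["glucose_wk4"].mean()  # mean of the available values only
trial["glucose_wk4_imputed"] = trial["glucose_wk4"].fillna(mean_wk4)
trial["wk4_was_imputed"] = trial["glucose_wk4"].isna()  # documents what was imputed
print(trial)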
Last Observation Carried Forward (LOCF): LOCF is a special case of imputation. Sometimes
during follow-up, one of a series of sequential measurements on a particular participant is missing.
For example, imagine that four weekly glucose values were supposed to be measured, but only the
week three measurement is missing. In that case, you could use the most recent previous value in
the series, which is the week two measurement, to impute the week three measurement. This
technique is called Last Observation Carried Forward (LOCF), and it is one of the most widely
used strategies (see the sketch below). Although imputation adds bias, LOCF adds bias in the
conservative direction, making it more difficult to demonstrate efficacy. That's why this approach
is popular with regulators, who want to put the burden of proof on the drug and the study sponsor.
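The following sketch shows LOCF on hypothetical long-format data, where each row is one weekly
visit for one participant and the week three glucose value is missing:

import numpy as np
import pandas as pd

visits = pd.DataFrame({
    "participant_id": [101, 101, 101, 101],
    "week": [1, 2, 3, 4],
    "glucose": [102.0, 99.0, np.nan, 97.0],  # mg/dL; week 3 is missing
})

# LOCF: within each participant, carry the most recent previous value forward
visits = visits.sort_values(["participant_id", "week"])
visits["glucose_locf"] = visits.groupby("participant_id")["glucose"].ffill()
print(visits)  # week 3 now carries the week 2 value (99.0)

Note that the forward fill only looks backward in time, so a missing week one value would stay
missing; LOCF can't fabricate a value when no earlier observation exists.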
Handling multiplicity
Every time you perform a statistical significance test, you run a chance of being fooled by random
fluctuations into thinking that some real effect is present in your data when, in fact, none exists (review
Chapter 3 for a refresher on statistical testing). If you declare that the results of a test are
statistically significant when in reality they are not, you are committing a Type I error. When you
say that you require p < 0.05 to declare statistical significance, you're testing at the 0.05 (or 5
percent) alpha (α) level. This is another way of saying that you want to limit your Type I error rate
to 5 percent. But that 5 percent error rate applies to each and every statistical test you run, so the
more analyses you perform on a data set, the higher your overall α level climbs. Assuming the tests
are independent, the overall α for N tests is 1 - (1 - 0.05)^N. If you perform two tests at α = 0.05,
your chance of at least one of them coming out falsely significant is about 10 percent. If you run
40 tests, the overall α level jumps to 87 percent! This is referred to as the problem of multiplicity,
or as Type I error inflation.
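A few lines of Python verify those numbers, under the simplifying assumption that the tests are
independent:

# Overall (familywise) alpha for N independent tests at alpha = 0.05:
# P(at least one false positive) = 1 - (1 - alpha)**N
alpha = 0.05
for n_tests in (1, 2, 40):
    overall = 1 - (1 - alpha) ** n_tests
    print(f"{n_tests:>2} tests: overall alpha = {overall:.1%}")
# Prints 5.0% for 1 test, 9.8% (about 10 percent) for 2, and 87.1% for 40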